Abstract

We propose two scheduling algorithms that seek to optimize the perceptual quality of scalably coded 
videos transmitted over slow fading wireless channels. The first scheduling algorithm is derived from a Markov 
Decision Process (MDP) formulation developed here. We model the dynamics of the channel as a Markov chain 
and reduce the problem of dynamic video scheduling to a tractable Markov decision problem over a finite 
state space. Based on the MDP formulation, a near-optimal scheduling policy is computed that maximizes 
an objective proxy of video quality, the time-average Multi-Scale Structural SIMilarity (MS-SSIM) index. 
Using insights taken from the development of the optimal MDP-based scheduling policy, the second proposed
scheduling algorithm is an online scheduling method that requires only easily measurable knowledge of
the channel dynamics, and is thus viable in practice. Simulation results show that the performance of both 
scheduling algorithms is close to a performance upper bound also derived in this paper. 

Index Terms 

Video transport, Scheduling algorithm, Wireless communication, Image quality.

I. Introduction 

Video transmission over wireless channels is a challenging task. The capacity of wireless channels 
varies over time, making the delivery of real-time video challenging due to tight delay constraints. 
For example, if the coherence time of the channel is comparable to the delay constraint, then the 
time-diversity of the channel cannot be exploited. Traditional channel coding methods cannot provide 
a graceful visual quality degradation of the received video in the presence of deep fades. Adaptive 
transmission techniques such as multi-layer scheduling and link adaptation can be employed to adjust
rates to changing channel conditions. Furthermore, video packets are structured. Due to the nature of 
predictive video coding algorithms, a video frame can be decoded only when its predictors have been 
received at the receiver. Hence, the prediction structure of the video codec enforces a partial order on 
the transmissions of the video packets. 

Scalable video coding (SVC) is one approach to enable flexible video transmission over channels 
with varying throughput [1], [2]. An SVC video encoder produces a layered video stream that contains
a base layer and several enhancement layers. If the throughput is low, the transmitter can choose to 
transmit the base layer only, which provides a moderate, but acceptable, degree of visual quality at the 
receiver. If the channel conditions improve, the transmitter can transmit one, or more, enhancement 
layers to further improve the visual quality. Conceptually, SVC provides a means to adapt the data 
rate for wireless video transmission. The wireless transmitter can adapt the data rate by selectively 
scheduling video data associated with various layers for transmission rather than transcoding the video 
sequence into a different rate. 

Designing scalable video scheduling algorithms for wireless channels is a complex task. The scheduling
policy depends not only on the channel conditions but also on the receiver buffer state. For
example, if the receiver has successfully buffered base layer data over many frames, the scheduler could 
choose to transmit some enhancement layer data to improve the video quality even if the throughput is 
low. At any time, the scheduling decision will determine the receiver buffer state which, in turn, affects 
the future scheduling decisions. Therefore, adaptive video data scheduling is a sequential decision 
problem. The most natural way to address such problems is to model the dynamics of the channel as a 
finite state Markov chain and to employ a Markov decision process (MDP)-based formulation to study 
scheduling methods. Directly determining an optimal scheduling policy using an MDP formulation is
not possible, however, because the system state space is infinite (see Section III-A). Moreover,
in a practical wireless network, a model for the dynamics of the channel states is not typically available, 
which limits the applicability of this approach. 

A. Contributions 

The objective of this paper is to leverage the MDP framework to develop practical scheduling 
algorithms and optimize the receiver perceptual video quality for scalable video transmission over 
wireless channels. First, we propose a tractable MDP formulation based on a reasonable approximation 
of the state space. Near optimal scheduling policies can be derived from this MDP formulation. Then, 
we propose a scheduling algorithm that substantially simplifies the MDP-based scheduling policy as 
it requires only limited information regarding the channel state dynamics. We prove an upper bound
on the achievable video quality of all possible scheduling algorithms. Simulation results show that, 
under different channel conditions, the performance of the proposed scheduling algorithms is indeed very
close to the upper bound. 

The contributions of this paper are:

1) An MDP formulation is proposed to facilitate the design of adaptive scheduling policies. Typical
mobile users usually have an application layer storage space of several gigabytes. Thus, the buffer 
size can be effectively regarded as infinite. Because the performance of the scheduling policy 
depends on the receiver buffer state, the policy needs to be optimized over an infinitely large 
state space and the scheduling problem is intractable. In this paper, by applying reasonable 
restrictions on the set of scheduling policies considered in our MDP formulation, we prove that 
optimizing the transmission policy is equivalent to solving a semi-Markov decision problem on
a finite state set (see Section III). Based on this result, near-optimal scheduling policies can be
derived using the proposed MDP formulation. 

2) A simple and near-optimal scheduling algorithm is proposed. In most cases, models for channel 
dynamics are not available. By simplifying the channel model and the scheduling decision of 
the MDP formulation, we devise an on-line scheduling algorithm which, unlike the MDP-based 
policy, only requires limited measurable knowledge of the channel dynamics. Simulation results 
show that the proposed on-line algorithm performs nearly as well as the MDP-based scheduling 
policy. 

3) Performance optimality is justified. To assess the performance of the proposed scheduling
algorithms, an upper bound on the achievable video quality for adaptive scheduling is proved.
Simulation results show that both the MDP-based scheduling policy and the proposed on-line 
scheduling policy perform close to the upper bound. 

B. Related Work 

Adaptive video data scheduling is an important topic of research [3]-[9]. In [3], adaptive video
transmission over a packet erasure channel was studied by modeling the buffer state as a controlled 
Markov chain. In [4], an MDP-based scheduling algorithm was proposed for video transmission over 
packet loss networks. This work was further extended for wireless video streaming in [5]. The wireless 
channel was modeled as a binary symmetric channel. This channel model can only be justified for 
fast fading channels, where the coherence time is much less than the delay constraint. In that case, 
interleaving can be applied without violating the delay constraint, and the channel will appear as an i.i.d.
channel. For slow fading channels such as those considered here, the bit error rate cannot be modeled
as a constant. In [6], [7], and [8], reinforcement learning frameworks were proposed for wireless video
transmission. Their proposed algorithms were based on an MDP with a discounted-reward maximization
formulation. The transmitter learns the characteristics of the channel and the video sequence during the 
transmission process. The scheduling policy is updated according to the learned characteristics. In our 
previous work [9], an infinite-horizon average-reward maximization MDP formulation was proposed.
The channel and source characteristics, unlike in this paper, were assumed to be known. 

The most closely related prior works, which include [5], focus on single-user scalable video
transmission over wireless channels. Our work contrasts with these as follows:

• Channel Characteristics. We focus on slow-fading wireless channels experienced by pedestrian 
users. In the channel model of [5], the bit error probability of the channel was assumed constant.
This assumption can only be justified for fast fading channels, where the channel coherence time is 
much less than the delay constraint in video applications. In that case, interleaving can be applied 
without violating the delay constraint, and the channel will appear to have i.i.d. bit errors. For slow 
fading channels, where the coherence time is much longer, it is impossible to apply interleaving 
over many coherence periods due to the delay constraint. In this case, i.i.d. models are no longer 
suitable because they do not capture information regarding channel variations. By contrast, the 
algorithm proposed in this paper explicitly considers channel state variation in scheduling. 

• Optimization objective. Most of the existing MDP-based scheduling algorithms are based on a 
utility function as the optimization objective. The utility function is usually written as a weighted 
sum of distortion reductions incurred once decoding the received data units. The weights assigned 
to different data units, to some extent, reflect their importance and inter-dependency, but are 
heuristically chosen. The resulting utility function cannot accurately indicate the visual quality of 
played out frames. Here, instead of optimizing a utility function, we directly optimize the visual 
quality of the video frame played out in each frame slot. The visual quality is measured via the 
MS-SSIM index, which correlates well with human subjective judgments [10].

• Non-availability of channel state dynamics. In a practical wireless video transmission application, 
models for the dynamics of the channel state are typically unavailable. To address this problem, 
a reinforcement learning algorithm can be employed to learn a good policy from the outcomes of wrong
scheduling actions. Video quality, however, will be degraded during the learning period. We 
propose an adaptive alternative to such reinforcement learning methods that uses only the channel
coherence time and the current channel throughput, both of which are easy to measure in practice. The
performance of the proposed algorithm is very close to a derived performance upper bound.

C. Organization of paper 

This paper is organized as follows: The system model is introduced in Section II. The assumptions
we make about the video codec and the rate-quality model are also described in Section II. In Section
III, the MDP formulation and the performance upper bound are proposed. The near-optimal on-line
scheduling algorithm is introduced and validated by simulations in Section IV. Section V concludes
the paper.

II. System Model 

We first describe the wireless video system to be considered. Then, we present our video codec 
configuration and introduce the rate-quality model. 

We briefly introduce some notation used in the paper. $\mathbf{A}$ and $\mathbf{a}$ are examples of a matrix and a
vector, respectively. $\mathcal{A}$ is a set and $|\mathcal{A}|$ is its cardinality. $\lceil \cdot \rceil$ is the ceiling function. $\mathbb{P}(\cdot)$ is the
probability measure and $\mathbb{E}[\cdot]$ is the expectation. $\mathbb{N} = \{0, 1, 2, \dots\}$ is the set of non-negative
integers. Other frequently used notation is summarized in Table I.

A. System Overview 

We consider a time-slotted system that transmits scalable videos over a slow fading wireless channel. 
The video sequence is encoded with a quality-scalable video encoder and is stored in a video server. 
The video server transmits video data to a mobile user via a wireless transmitter. Each slot, the server 
sends some video data upon request of a scheduler at the wireless transmitter. This data is packetized 
at the wireless transmitter for physical layer transmission. The scheduler operates according to a policy 
which maps the channel and receiver buffer state to the scheduling action (see Fig. 1).

We assume that the link between the video server and the wireless transmitter is not the bottleneck 
for transmission to the mobile. Thus, from the perspective of the wireless transmitter, the whole video 
sequence is available for transmission. We also assume that the physical layer channel state information 
is available at the transmitter and that the modulation and coding scheme (MCS) is determined by a 
given physical layer link-adaptation policy.



B. Video Codec Configuration 

We assume that the video sequence is encoded by an H.264/SVC-compatible scalable video encoder. 
The duration of each frame, $\Delta T$, is called a frame slot. The video frames are uniformly partitioned into
Groups of Pictures (GOPs). Every GOP has $F_{\mathrm{GOP}}$ frames. The first frame in a GOP is an I frame
while the other frames are P frames. Every frame is encoded into $L$ layers. The first layer is the base
layer; the other layers are enhancement layers. Every enhancement layer of a frame is predictively
encoded using the lower layers of the frame. The base layer of a P frame is predictively encoded
using the base layer of its preceding frame. The base layer of an I frame is encoded independently
(see Fig. 2).

Each frame has a playout deadline at the receiver. In the following, frames whose deadlines have 
expired are called expired frames, otherwise they are said to be active frames. The first active frame 
is called the "current frame". At any point in time, frames are indexed relative to the current frame 
as shown in Fig. 2. The video data in the $\ell$th layer of the $f$th frame is called the $(f, \ell)$th video data
unit.

We adopt the prediction structure in Fig. 2 rather than the "Hierarchical B" structure because no
structural delay is introduced [1]. In the "Hierarchical B" prediction structure, the encoding order
differs from the display order, thus the transmission of a frame must be delayed until all necessary 
predictors are received. Besides, because the enhancement layers are used to predict other frames in 
the "Hierarchical B" structure, dropped enhancement layers can give rise to error propagation and 
unpredictable visual quality degradation. At the possible cost of lower compression efficiency, the 
prediction structure that we use eliminates error propagation arising from enhancement layer losses, 
since there is no inter-frame prediction among enhancement layers. 

C. Rate-Quality Model 

Let $z_f$ be the amount of received data for the $f$th frame. The rate-quality function $q_f(z_f)$ captures
the quality of the frame when it is decoded. Let $\omega_{(f,\ell)}$ be the amount of data in the $(f,\ell)$th data unit
and $q_{(f,\ell)}$ be the visual quality increment if the $\ell$th layer is correctly received, given all its predictors
have also been received. As illustrated in Fig. 3, since a data unit can be decoded only when all its
associated data has been received, $q_f(z_f)$ is a piecewise constant and right-continuous function with
jumps at $z_f = \sum_{i=1}^{\ell} \omega_{(f,i)}$, $\ell = 1, 2, \dots, L$. Thus $q_{(f,\ell)}$ and $\omega_{(f,\ell)}$ characterize $q_f(z_f)$.

In a real video sequence, for a given layer $\ell$, the rate-quality characteristics $q_{(f,\ell)}$ and $\omega_{(f,\ell)}$ vary
across frames. In this paper, we adopt a simple model to approximate $q_f(z_f)$. Let $n$ be the number
of frames in a video sequence. Since for each layer, $q_{(f,\ell)}$ is almost the same for all frames, we use
$q_\ell = \frac{1}{n} \sum_{f=1}^{n} q_{(f,\ell)}$ as an estimate for the visual quality increment if the $\ell$th layer is correctly received.
We also assume that $\omega_{(f,\ell)}$ is almost the same among I frames and among P frames. Thus, let
$\omega^{I}_\ell$ and $\omega^{P}_\ell$ be the average values of $\omega_{(f,\ell)}$ across the video for I frames and P frames, respectively.
Our rate-quality models $q^{I}(z_f)$ and $q^{P}(z_f)$ for I frames and P frames are respectively constructed as
piecewise constant functions with jumps at $z_f = \sum_{i=1}^{\ell} \omega^{I}_i$ and $z_f = \sum_{i=1}^{\ell} \omega^{P}_i$, $\ell = 1, 2, \dots, L$.
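
The piecewise-constant, right-continuous shape of the rate-quality model can be illustrated with a short Python sketch. The function name `rate_quality` and all numerical values are hypothetical and chosen only for illustration; they are not taken from the paper or from any measured video.

```python
from bisect import bisect_right
from itertools import accumulate

def rate_quality(z, omega, q):
    """Quality of a frame for which z bits have been received.

    omega[i] is the size of layer i+1 and q[i] its quality increment
    (both hypothetical inputs). Layer i+1 decodes once all of layers
    1..i+1 have arrived, i.e. once z >= omega[0] + ... + omega[i].
    """
    thresholds = list(accumulate(omega))    # jump points of the step function
    decoded = bisect_right(thresholds, z)   # number of completely received layers
    return sum(q[:decoded])

# Illustrative numbers only (assumptions, not values from the paper).
omega_P = [20_000, 15_000, 15_000]   # bits per layer of a P frame
q_layers = [0.90, 0.05, 0.03]        # quality increment per layer
```

Using `bisect_right` makes the function right-continuous: at exactly 20,000 received bits the base layer already counts as decoded, while 19,999 bits yield zero quality.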

Conventional image quality measures such as the PSNR reflect absolute signal fidelity without
accounting for perceptual visual quality. Recently, a variety of models that accurately predict perceptual
video quality have been proposed [11]-[13]. In our formulation, we adopt the MS-SSIM index as the
visual quality measure [11], since it has been shown to correlate quite well with perceptual visual
quality and it is of reasonable computational complexity [10].

The MS-SSIM index of a video sequence ranges from 0 to 1. The larger the index, the better the
quality. In our rate-quality model, the marginal quality increment $q_\ell$ is measured using the MS-SSIM
index. Larger values of $q_\ell$ mean a larger marginal improvement can be achieved by transmitting the
$\ell$th layer data units.



D. Streaming Setup 

We focus on scheduling for a slow fading channel. By slow fading, we mean that the coherence time
of the channel is less than the duration of a GOP and larger than the duration of a frame. Assuming
that mobile users move at a walking speed of 1.5 m/s and that the carrier frequency is 2 GHz, the Doppler
spread is about 10 Hz and the coherence time is about 100 ms. A typical GOP duration is about 1 second
and a frame slot is about 30 ms. Hence, for pedestrian video users, wireless channels are slow fading.
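
The numbers above can be reproduced with a few lines of Python. This is a back-of-the-envelope sketch: it assumes the maximum Doppler shift $f_d = v f_c / c$ and the rule-of-thumb coherence time $T_c \approx 1/f_d$, which is only an order-of-magnitude estimate. The function names are hypothetical.

```python
SPEED_OF_LIGHT = 3.0e8  # m/s, rounded

def doppler_spread_hz(speed_mps, carrier_hz):
    """Maximum Doppler shift f_d = v * f_c / c."""
    return speed_mps * carrier_hz / SPEED_OF_LIGHT

def coherence_time_s(doppler_hz):
    """Rule-of-thumb coherence time T_c ~ 1 / f_d (assumed approximation)."""
    return 1.0 / doppler_hz

f_d = doppler_spread_hz(1.5, 2.0e9)   # pedestrian user, 2 GHz carrier
t_c = coherence_time_s(f_d)

frame_slot, gop = 0.030, 1.0          # seconds, values quoted in the text
is_slow_fading = frame_slot < t_c < gop
```

With these inputs the coherence time falls between the frame slot and the GOP duration, which is the slow-fading regime as defined in this section.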

As the channel state is stable during each frame slot, the scheduling decision is made on a frame- 
by-frame basis. At the beginning of each frame slot, a frame is played out, and video data units are 
scheduled for transmission. The scheduling action is defined as a set of ordered video data units 

$$\mathcal{U} = \{(f_1, \ell_1), (f_2, \ell_2), \dots, (f_{|\mathcal{U}|}, \ell_{|\mathcal{U}|})\}. \qquad (1)$$

When scheduling action $\mathcal{U}$ is taken, the associated data units are transmitted sequentially. Each
scheduled data unit is packetized into physical layer packets, and each packet is retransmitted,
if errors occur, until it is acknowledged.

III. Markov Decision Process-Based Model 

In this section, we propose an MDP-based model to determine the near-optimal scheduling policy. 
To that end we describe the scheduler's state space and the policies to be considered. We then show 
how to reduce the scheduling problem to a finite-state Markov decision problem using reasonable 
approximations. To validate the optimality of the MDP-based scheduling policies, we develop a 
performance upper bound at the end of this section. 

A. Scheduling Policy and State Space

Considering all possible scheduling actions makes defining the scheduling policy and representing 
the buffer state unmanageably complex. On one hand, to capture the buffer state, the frame index and 
the layer index of each received data unit need to be recorded. If we assume an infinite playback buffer, 
the number of received data units is not bounded. So we cannot represent all possible buffer states 
using a finite-dimensional space. On the other hand, we note that not all possible scheduling policies 
need to be considered. For example, video data units should not be transmitted before their predictors. 
If their predictors are not received before their playout deadlines, these units are undecodable and 
useless. Thus we need only consider those scheduling strategies that are not dominated and have 
potential to achieve good performance. 

Specifically, we consider scheduling policies under the following assumptions: 

Assumption 1: The scheduler always schedules a data unit for transmission after its predictors. 

Assumption 2: The amount of video data scheduled in a slot exceeds the amount of data which
can be transmitted in the slot.

Assumption 3: Let $\mathcal{W}$ denote the set of data units associated with the first $W$ active frames. We
assume the scheduler first sends the video data in $\mathcal{W}$. Then, if all the data in $\mathcal{W}$ has been received,
the policy greedily schedules as many enhancement layers as possible, i.e., starts transmitting the next
frame only when all the layers of the preceding frame have been received.

Assumption 4: The scheduler never schedules enhancement layer data units for future P frames if 
those for earlier P frames in the same GOP have not been sent. 
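
As an illustration of Assumption 1, the following Python sketch checks whether an ordered scheduling action respects the prediction structure of Section II-B. The helper names, the 1-indexed layer convention, the callback `is_i_frame`, and the GOP size of 4 are all assumptions made for this example. The check looks only at direct predictors, so it presumes that the set of already received units is itself decodable.

```python
def predictors(f, l, is_i_frame):
    """Direct predictors of data unit (f, l); layers are 1-indexed."""
    if l > 1:
        return [(f, l - 1)]      # enhancement layer: next lower layer of the frame
    if is_i_frame(f):
        return []                # I-frame base layer is coded independently
    return [(f - 1, 1)]          # P-frame base layer: base layer of previous frame

def respects_prediction_order(action, received, is_i_frame):
    """True if every scheduled unit comes after all of its direct predictors."""
    available = set(received)
    for f, l in action:
        if any(p not in available for p in predictors(f, l, is_i_frame)):
            return False
        available.add((f, l))
    return True

gop_size = 4                                  # hypothetical GOP length
is_i = lambda f: f % gop_size == 0            # frames 0, 4, 8, ... are I frames
ok = respects_prediction_order([(0, 2), (1, 1), (1, 2)], {(0, 1)}, is_i)
bad = respects_prediction_order([(1, 2), (1, 1)], {(0, 1)}, is_i)
```

The second action is rejected because the enhancement layer of frame 1 is scheduled before the base layer it is predicted from.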

Assumption 1 ensures that the transmission order is compatible with the prediction order given in
Section II-B, since a data unit can be decoded only when its predictors are received. Assumption 2
ensures the transmitter will not be idle during a slot. Assumption 3 stems from the fact that, when
many frames are buffered at the receiver, the scheduler can transmit more enhancement layers because
there is sufficient time before the frames are played out. As will be discussed in Section III-C, this
assumption helps to simplify the policy optimization problem. It should be noted that policies under
Assumption 3 are different from the sliding window policies defined in [4]. Indeed, our scheduling
policy allows the transmitter to transmit data units outside the window. With Assumption 4 in effect, at
any time and for all the P frames in a GOP, the scheduler does not sacrifice the quality of the frames
that will be displayed sooner for the frames to be displayed later by transmitting more enhancement
layer data for the latter. Because the optimization objective is the time-average MS-SSIM index and the
rate-quality function of each P frame is assumed to be the same, their qualities are equally
important. Transmitting more enhancement layer data for later frames does not help to improve the
time-average MS-SSIM value.

Note that, although the P frames within a GOP are equally important in terms of contribution to the
time-average MS-SSIM index, the I and P frames in different GOPs are not. For example, when the
channel throughput is very low, it may be beneficial to sacrifice P frames in the current GOP in order
to transmit the base layer for an I frame in the next GOP, because an I frame contains much more
data than a P frame and the loss of an I frame would cause severe decoding failures throughout the
next GOP. To differentiate the importance of current and future GOPs, we partition the data units of
the active frames into three sets: $\mathcal{X}^{I}$, $\mathcal{X}^{\mathrm{pre}}$ and $\mathcal{X}^{\mathrm{post}}$. The set $\mathcal{X}^{I}$ contains the data units of the first active
I frame, $\mathcal{X}^{\mathrm{pre}}$ contains the data units preceding the first active I frame, and $\mathcal{X}^{\mathrm{post}}$ contains the remaining
active data units (see Fig. 4).

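We define the overall buffer state space $\mathcal{V}$ via three sets $\mathcal{V}^{I}$, $\mathcal{V}^{\mathrm{pre}}$ and $\mathcal{V}^{\mathrm{post}}$, where
$\mathcal{V} = \mathcal{V}^{I} \times \mathcal{V}^{\mathrm{pre}} \times \mathcal{V}^{\mathrm{post}}$.

$\mathcal{V}^{I}$: The state of $\mathcal{X}^{I}$ is defined as $v^{I} = (f^{I}, b^{I})$, where $f^{I} \in \{1, \dots, F_{\mathrm{GOP}}\}$ is the number of
frames until the first active I frame and $b^{I}$ is the number of received data units of $\mathcal{X}^{I}$;
thus $\mathcal{V}^{I} = \{1, \dots, F_{\mathrm{GOP}}\} \times \{1, \dots, L\}$.

$\mathcal{V}^{\mathrm{pre}}$: When Assumption 4 is enforced, the number of data units received in $\mathcal{X}^{\mathrm{pre}}$ must be
non-increasing in the frame index. Hence, we only need to record the total number of received data
units for each layer. We define the buffer state for $\mathcal{X}^{\mathrm{pre}}$ as an $L$-dimensional vector
$v^{\mathrm{pre}} = (b^{\mathrm{pre}}_1, b^{\mathrm{pre}}_2, \dots, b^{\mathrm{pre}}_L)$, where $b^{\mathrm{pre}}_\ell$ is the number of received data units in the $\ell$th layer
for $\mathcal{X}^{\mathrm{pre}}$; thus $\mathcal{V}^{\mathrm{pre}} = \{0, 1, \dots, F_{\mathrm{GOP}} - 1\}^{L}$.

$\mathcal{V}^{\mathrm{post}}$: As with $\mathcal{V}^{\mathrm{pre}}$, we define the buffer state of $\mathcal{X}^{\mathrm{post}}$ as an $L$-dimensional vector
$v^{\mathrm{post}} = (b^{\mathrm{post}}_1, b^{\mathrm{post}}_2, \dots, b^{\mathrm{post}}_L)$, where $b^{\mathrm{post}}_\ell$ is the number of received data units in the $\ell$th layer of
$\mathcal{X}^{\mathrm{post}}$. Because the receiver buffer size is assumed to be large, i.e., essentially infinite, $b^{\mathrm{post}}_\ell$
is unbounded. Thus $\mathcal{V}^{\mathrm{post}} = \mathbb{N}^{L}$, where $\mathbb{N} = \{0, 1, 2, \dots\}$.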



In [16] and [17], it is shown that a first-order finite state Markov chain (FSMC) can be utilized
to describe the first-order channel state transition probabilities for Rayleigh fading channels.
First-order FSMC models have also been validated in [18] and [19] by wireless channel measurements in
urban areas. In our MDP-based model, we employ a first-order FSMC to describe the dynamics of
the channel state.

At the physical layer, the transmission bit rate $x$ is determined by the modulation and coding
scheme (MCS) and the packet error rate $y$ is determined by both the channel state and the MCS.¹ We
assume the chosen MCS is a function of the channel state under a given link adaptation mechanism.
For example, the physical layer can always choose the MCS which maximizes the channel throughput
$x(1 - y)$. Thus, there is a one-to-one mapping from the channel state to the tuple $(x, y)$. We define the
channel state as $c = (x, y)$. Due to the Markov property of the channel, the channel state can also
be modeled by an FSMC. The channel state space is $\mathcal{C} = \{c^{1}, \dots, c^{|\mathcal{C}|}\}$, where $c^{i} = (x^{i}, y^{i})$ is the $i$th
channel state. The state transition matrix $\mathbf{P}^{c}$ is a $|\mathcal{C}| \times |\mathcal{C}|$ matrix with entry $P^{c}_{i,j} = \mathbb{P}(c^{j} \mid c^{i})$ being the
transition probability from state $c^{i}$ to $c^{j}$.
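
To make the FSMC channel model concrete, the sketch below builds a small three-state chain, finds its stationary distribution by power iteration, and computes the long-run average throughput $x(1 - y)$. The states, the transition matrix, and the function name are hypothetical; a real model would be fitted to channel measurements.

```python
def stationary_distribution(P, iters=10_000, tol=1e-12):
    """Stationary distribution of a finite Markov chain by power iteration."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi

# Hypothetical channel states c = (x, y): bits per slot and packet error rate.
states = [(20_000, 0.30), (40_000, 0.10), (60_000, 0.05)]

# Hypothetical transition matrix; row i holds P(c^j | c^i). A slow fading
# channel mostly stays in its state or moves to a neighbouring one.
P_c = [[0.8, 0.2, 0.0],
       [0.1, 0.8, 0.1],
       [0.0, 0.2, 0.8]]

pi = stationary_distribution(P_c)
throughput = [x * (1 - y) for x, y in states]            # x(1 - y) per state
mean_throughput = sum(p * r for p, r in zip(pi, throughput))
```

Power iteration is used here only to keep the example dependency-free; it converges because the example chain is irreducible and aperiodic.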

The system state space $\mathcal{S}$ is defined as the product of the channel state space $\mathcal{C}$ and the buffer state
space $\mathcal{V}$. For each state $s \in \mathcal{S}$, we define a feasible control set $\mathcal{U}_s$ that contains all the scheduling actions
(see Equation (1)) complying with all four assumptions. The state $s$ contains all the information
about the receiver buffer and the channel. The transmitter must decide which action in $\mathcal{U}_s$ to take in
order to maximize the time-average MS-SSIM index value. We define the scheduling policy $\mu(\cdot)$ as
the mapping from the system state $s$ to an action in $\mathcal{U}_s$. In the following sections, we show how to
optimize the scheduling policy $\mu(\cdot)$.



B. Optimization Objective 

Since the channel condition is modeled as a random process, we denote by $(C_t, V_t, S_t)_{t \in \mathbb{N}}$ the
random processes modeling channel, buffer and system state, respectively. Accordingly, we denote by
$(c_t, v_t, s_t)_{t \in \mathbb{N}}$ their realizations. At the beginning of each time slot $t$, the first frame in the window is
played out and the MS-SSIM index is

$$q(S_t) = \sum_{\ell=1}^{L} q_\ell \, \mathbb{1}_\ell(S_t), \qquad (2)$$

where $\mathbb{1}_\ell(S_t)$ is the indicator that the $\ell$th layer of the displayed frame is available in state $S_t$. The
quantity $q_\ell$ is the marginal video quality improvement if, in addition to layers $1, 2, \dots, \ell - 1$, layer
$\ell$ is available (see Section II-C). Our aim is to find an optimal policy $\mu^{*}(\cdot)$ which maximizes the
time-average MS-SSIM index, i.e.,

$$\mu^{*} = \arg\max_{\mu} \lim_{n \to \infty} \mathbb{E}_{\mu}\left[\frac{1}{n} \sum_{t=0}^{n-1} q(S_t)\right]. \qquad (3)$$
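
The objective can be illustrated on a finite playout trace. The Python sketch below evaluates the per-slot quality of equation (2) and its empirical time average. It assumes that layers are available consecutively from the base layer upward, so that the state of the displayed frame reduces to a layer count. The function names, the trace, and the increments are hypothetical.

```python
def frame_quality(layers_available, q):
    """Per-slot quality as in equation (2).

    layers_available is the number of consecutive layers (base layer
    first) buffered for the displayed frame; q[l] are the marginal
    quality increments.
    """
    return sum(q[:layers_available])

def time_average_quality(trace, q):
    """Empirical time-average quality over a finite playout trace."""
    return sum(frame_quality(k, q) for k in trace) / len(trace)

q_layers = [0.90, 0.05, 0.03]        # illustrative increments (assumed)
trace = [3, 3, 2, 1, 0, 1, 2, 3]     # layers available at playout in each slot
avg = time_average_quality(trace, q_layers)
```

A slot with zero available layers models a frame whose base layer missed its deadline, and it contributes zero quality to the average.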



¹Here, bit rate $x$ is the number of bits transmitted in a time slot $\Delta T$, i.e., the transmission rate normalized by the slot duration $\Delta T$.

C. Finite State Problem Formulation

Since the state space $\mathcal{V}^{\mathrm{post}}$ is infinite, the state space $\mathcal{S}$ is also infinite. Optimizing the scheduling
policy over this infinite state space is intractable. With Assumption 3, the scheduling policy is actually
fixed when all the data in window $\mathcal{W}$ is received. We only need to determine the optimal scheduling
policy for states where some of the video data in the window has not been received, which is a finite
state set. The system state, however, still evolves in the infinite state space $\mathcal{S}$. In the following, we
show how to simplify this infinite state space problem to a finite-state problem.

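We define the set of states where some of the video data in $\mathcal{W}$ has not been received as follows:

$$\mathcal{S}_{\mathcal{W}} = \{s \mid s \in \mathcal{S},\ \mathcal{O}(s) \subsetneq \mathcal{W}\}, \qquad (4)$$

where $\mathcal{O}(s)$ is the set of buffered active video data units when the state is $s$. We define another subset
of $\mathcal{S}$ as follows:

$$\bar{\mathcal{S}}_{\mathcal{W}} = \{s \mid s \in \mathcal{S},\ \mathcal{W} \subseteq \mathcal{O}(s)\}. \qquad (5)$$

For all the states in $\bar{\mathcal{S}}_{\mathcal{W}}$, all the video data units in $\mathcal{W}$ have been received. Note that, under Assumption
3, the video scheduler focuses on transmitting data in $\mathcal{W}$ until $\mathcal{W} \subseteq \mathcal{O}(s)$. Thus $\mathcal{S}_{\mathcal{W}}$ and $\bar{\mathcal{S}}_{\mathcal{W}}$ form a
partition of the state space $\mathcal{S}$. In other words, we have $\mathcal{S}_{\mathcal{W}} \cup \bar{\mathcal{S}}_{\mathcal{W}} = \mathcal{S}$ and $\mathcal{S}_{\mathcal{W}} \cap \bar{\mathcal{S}}_{\mathcal{W}} = \emptyset$.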

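Given a policy $\mu(\cdot)$, the system state evolves as a Markov chain in the set $\mathcal{S}_{\mathcal{W}} \cup \bar{\mathcal{S}}_{\mathcal{W}}$. Because the
transmission rate is finite, the number of states in $\bar{\mathcal{S}}_{\mathcal{W}}$ which can be reached from $\mathcal{S}_{\mathcal{W}}$ in one step is
also finite. We formally define this set of states as follows:

$$\mathcal{S}_{A} = \{s \mid s \in \bar{\mathcal{S}}_{\mathcal{W}};\ \exists\, s' \in \mathcal{S}_{\mathcal{W}} \text{ s.t. } P_{\mu}(s \mid s') > 0\}, \qquad (6)$$

where $P_{\mu}(s \mid s')$ is the state transition probability under policy $\mu$ (for the expression for $P_{\mu}(s \mid s')$, see
Appendix A). Thus, to move from $\mathcal{S}_{\mathcal{W}}$ into the set $\bar{\mathcal{S}}_{\mathcal{W}}$, the system state first hits a state in $\mathcal{S}_{A}$ and then
stays in $\bar{\mathcal{S}}_{\mathcal{W}}$ for some time. During this period, the decoded video quality is always $\sum_{\ell=1}^{L} q_\ell$ because
all the layers in $\mathcal{W}$ are available. The evolution of the system after it moves into the set $\bar{\mathcal{S}}_{\mathcal{W}}$ affects the
performance of the system. In general, the longer it stays in $\bar{\mathcal{S}}_{\mathcal{W}}$, the better the performance is. Although
the scheduling policy in $\bar{\mathcal{S}}_{\mathcal{W}}$ is fixed as described in Assumption 3, the policy in $\mathcal{S}_{\mathcal{W}}$ determines how
frequently the system state will hit $\bar{\mathcal{S}}_{\mathcal{W}}$ and thus critically impacts the system performance.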

In the following, we denote the system under a given policy $\mu$ as system $\Pi_{\mu}$. Let $t_{\mu}(s)$ be the expected
time spent by $\Pi_{\mu}$ in $\bar{\mathcal{S}}_{\mathcal{W}}$ after it enters $\bar{\mathcal{S}}_{\mathcal{W}}$ at state $s \in \mathcal{S}_{A}$. Let $\tilde{P}_{\mu}(s' \mid s)$ denote the probability that
$\Pi_{\mu}$ jumps back to $\mathcal{S}_{\mathcal{W}}$ at state $s' \in \mathcal{S}_{\mathcal{W}}$ after it enters $\bar{\mathcal{S}}_{\mathcal{W}}$ at state $s$. To find the optimal policy, we
define a finite-state system $\tilde{\Pi}_{\mu}$ as follows:

Definition 1: A system $\tilde{\Pi}_{\mu}$ is called the simplified system of the original system $\Pi_{\mu}$ if it has the
following dynamics:

1) The system is a semi-Markov process over state space $\tilde{\mathcal{S}} = \mathcal{S}_{\mathcal{W}} \cup \mathcal{S}_{A}$. In any state $s \in \tilde{\mathcal{S}}$, the
visual quality is $q(s)$ as in (2). In any state in $\mathcal{S}_{\mathcal{W}}$, the system evolves according to the policy
$\mu$. The system state transition probability is $P_{\mu}(\cdot \mid \cdot)$.

2) When the system jumps to a state $s \in \mathcal{S}_{A}$, it spends $t_{\mu}(s)$ slots in $s$ with video quality $\sum_{\ell=1}^{L} q_\ell$
for each slot. The system then transitions to a state $s' \in \mathcal{S}_{\mathcal{W}}$ with probability $\tilde{P}_{\mu}(s' \mid s)$ (see Fig.
5).

It should be noted that $\tilde{\Pi}_{\mu}$ is not coupled with the original system $\Pi_{\mu}$. It just shares some properties
with the original system. The following theorem relates the visual quality under $\Pi_{\mu}$ and that of $\tilde{\Pi}_{\mu}$.

Theorem 1: If the jump chain of the original system $\Pi_{\mu}$ is positive recurrent, then the time-average
MS-SSIM index of $\Pi_{\mu}$ is the same as that of the simplified system $\tilde{\Pi}_{\mu}$.

Proof Sketch: If the jump chain is positive recurrent, the jumps from $\mathcal{S}_{\mathcal{W}}$ to $\mathcal{S}_{A}$ partition the
Markov process into i.i.d. segments. We only need to optimize the policy $\mu$ to maximize the average
quality in each segment. Every segment consists of two consecutive subsegments. During the first
subsegment, $s_t \in \bar{\mathcal{S}}_{\mathcal{W}}$. In the other subsegment, $s_t \in \mathcal{S}_{\mathcal{W}}$. Because every state in $\bar{\mathcal{S}}_{\mathcal{W}}$ has the same
visual quality $\sum_{\ell=1}^{L} q_\ell$, we can abstract the first subsegment as a single state with transition probability
$\tilde{P}_{\mu}(\cdot \mid \cdot)$. This simplified system provides the same average quality as the original system. For a detailed
proof, see the technical report [20]. ■

Remark: The positive recurrent condition for the jump chain means that the average throughput of
the channel is neither too large nor too small relative to the average data rate of the video. If the 
average throughput of the channel is very large, the receiver buffer can always buffer enough frames 
and dynamic scheduling is unnecessary. If the average channel throughput is too small, the channel 
cannot support the video stream and dynamic scheduling cannot help either. 

As indicated by Theorem 1, given any policy $\mu$, the visual quality of $\Pi_\mu$ is the same as that of $\tilde{\Pi}_\mu$. Thus, 
we can optimize our policy with respect to $\tilde{\Pi}_\mu$, which has a finite state space, and a standard policy 
optimization algorithm can be applied. 

Before we can apply an MDP algorithm to optimize the policy, we need to compute $t_\mu(s)$ and 
$P_\mu(s'|s)$ for every state $s \in \mathcal{S}_{\overline{W}}$ and $s' \in \mathcal{S}_W$. Both $t_\mu(s)$ and $P_\mu(s'|s)$ only involve the dynamics of the 
system in $\mathcal{S}_{\overline{W}}$. Details on how to compute $t_\mu(s)$ and $P_\mu(s'|s)$ are found in Appendix B. 

D. Determining Optimal Policy via Value Iteration 

Given $t_\mu(\cdot)$ and $P_\mu(\cdot|\cdot)$, the optimal policy for an MDP can be determined for the simplified system $\tilde{\Pi}_\mu$, 
which is also the optimal policy of $\Pi_\mu$. Let $s_{ini}$ be any state in $\mathcal{S} = \mathcal{S}_W \cup \mathcal{S}_{\overline{W}}$. The hitting 
time to state $s_{ini}$ partitions the process into i.i.d. cycles. Optimizing the policy $\mu(\cdot)$ in the cycles 
maximizes the time-average MS-SSIM index of the system. Similar to the derivation in [21, p. 441], 
this is equivalent to an average-reward maximization problem with stage-reward $g(s) - \eta(s)\lambda$, where 
$\lambda$ is the expected average-reward of each cycle and 

$$g(s) = \begin{cases} q(s) & : s \in \mathcal{S}_W \\ t_\mu(s)\sum_{\ell=1}^{L} q_\ell & : s \in \mathcal{S}_{\overline{W}}, \end{cases} \qquad \eta(s) = \begin{cases} 1 & : s \in \mathcal{S}_W \\ t_\mu(s) & : s \in \mathcal{S}_{\overline{W}}, \end{cases}$$

where $q(s)$ is defined in (2). Let us denote by $h(s)$ the average reward-to-go in each cycle when the 
system starts at state $s$. Then we have the following Bellman equation: 

$$h(s) = g(s) - \eta(s)\lambda + \sum_{s' \in \mathcal{S}_W \cup \mathcal{S}_{\overline{W}}} P_\mu(s'|s)\,h(s'), \qquad (7)$$

where $h(s_{ini}) = 0$. To find the optimal policy, the standard value iteration algorithm can be applied 
[21, p. 430]. 
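For a fixed policy, weighting both sides of (7) by the stationary distribution $\pi$ of the transition matrix and summing over states gives $\lambda = \sum_s \pi(s)g(s) / \sum_s \pi(s)\eta(s)$. The Python sketch below evaluates this ratio by power iteration. It is only an illustration: the function name `average_reward` and the two-state numbers are assumptions, not values from the paper, and the chain is assumed to be irreducible and aperiodic.

```python
def average_reward(P, g, eta, iters=2000):
    """Estimate lambda = sum(pi*g) / sum(pi*eta) for a fixed policy.

    P   : row-stochastic transition matrix (list of lists), assumed
          irreducible and aperiodic so that power iteration converges
    g   : stage reward per state
    eta : sojourn time (in slots) per state
    """
    n = len(P)
    pi = [1.0 / n] * n  # start from the uniform distribution
    for _ in range(iters):
        # One step of pi <- pi * P
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    reward = sum(p * x for p, x in zip(pi, g))
    time = sum(p * x for p, x in zip(pi, eta))
    return reward / time


# Hypothetical two-state example: state 0 lasts one slot with reward 1,
# state 1 abstracts a three-slot sojourn with total reward 6.
P = [[0.5, 0.5],
     [1.0, 0.0]]
lam = average_reward(P, g=[1.0, 6.0], eta=[1.0, 3.0])
```

In this toy chain the stationary distribution is (2/3, 1/3), so the ratio is (8/3)/(5/3) = 1.6 reward units per slot.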

On the one hand, the assumptions on the scheduling policy result in the finite-state MDP-based formulation. 
On the other hand, the assumptions may render the derived scheduling policy sub-optimal. To 
verify that the performance of the scheduling policy derived from the MDP formulation is actually close 
to optimal, we prove a performance upper bound in the next section. 

E. Performance Upper Bound 

In the rest of the paper, let $R_t = X_t(1 - Y_t)$ be the throughput of the channel at time $t$, where 
$X_t$ is the transmission bitrate and $Y_t$ is the packet error rate as defined in Section III-A. Since 
the channel condition is modeled as a random process, $X_t$, $Y_t$ and $R_t$ are random processes and 
we will denote by $x_t$, $y_t$ and $r_t$ their realizations. As discussed in Section II-C, $q^I(z_t)$ and $q^P(z_t)$ 
are the rate-quality models of I frames and P frames, respectively. Let $\mathbb{1}_t^I$ be the indicator that 
the $t$th frame is an I frame. The time-average MS-SSIM of the transmitted video can be written as 
$\frac{1}{n}\sum_{t=1}^{n}[q^I(z_t)\mathbb{1}_t^I + q^P(z_t)(1 - \mathbb{1}_t^I)]$, where $n$ is the number of frames in the video sequence. An upper 
bound on the performance of any scheduler is given by the following offline optimization problem: 

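$$\begin{aligned} \underset{z_{1:n}}{\text{maximize}} \quad & \frac{1}{n}\sum_{t=1}^{n}\left[q^I(z_t)\mathbb{1}_t^I + q^P(z_t)(1 - \mathbb{1}_t^I)\right] \\ \text{s.t.} \quad & \frac{1}{t}\sum_{i=1}^{t} z_i \le \frac{1}{t}\sum_{i=1}^{t} r_i, \quad \forall t \in \{1, 2, \cdots, n\}, \end{aligned} \qquad (8)$$

where the constraint $\frac{1}{t}\sum_{i=1}^{t} z_i \le \frac{1}{t}\sum_{i=1}^{t} r_i$ guarantees that the received data for the frames displayed 
before time $t$ does not exceed the cumulative throughput prior to time $t$. We can further relax the 
constraints in (8) by only keeping the last one, i.e., when $t = n$. The relaxed optimization problem is 
then given by 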



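$$\begin{aligned} \underset{z_{1:n}}{\text{maximize}} \quad & \frac{1}{n}\sum_{t=1}^{n}\left[q^I(z_t)\mathbb{1}_t^I + q^P(z_t)(1 - \mathbb{1}_t^I)\right] \\ \text{s.t.} \quad & \frac{1}{n}\sum_{t=1}^{n} z_t \le \frac{1}{n}\sum_{t=1}^{n} r_t. \end{aligned} \qquad (9)$$

Let $\bar{q}^I(z_t)$ and $\bar{q}^P(z_t)$ be the concave envelopes of $q^I(z_t)$ and $q^P(z_t)$, respectively (see Fig. 3). Since 
$q^I(z_t)$ and $q^P(z_t)$ are upper bounded by $\bar{q}^I(z_t)$ and $\bar{q}^P(z_t)$, we can bound problem (9) by: 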

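$$\begin{aligned} \underset{z_{1:n}}{\text{maximize}} \quad & \frac{1}{n}\sum_{t=1}^{n}\left[\bar{q}^I(z_t)\mathbb{1}_t^I + \bar{q}^P(z_t)(1 - \mathbb{1}_t^I)\right] \\ \text{s.t.} \quad & \frac{1}{n}\sum_{t=1}^{n} z_t \le \frac{1}{n}\sum_{t=1}^{n} r_t. \end{aligned} \qquad (10)$$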
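The concave envelope of a piecewise-constant, right-continuous rate-quality function is the upper hull of its jump points, extended as a constant beyond the last jump. The Python sketch below builds that hull and evaluates it. The helper names and the breakpoint values are hypothetical and serve only as an illustration; they are not taken from the rate-quality model of the paper.

```python
def concave_envelope(points):
    """Upper concave hull of (rate, quality) breakpoints sorted by rate."""
    hull = []
    for x, y in points:
        # Drop the last hull point while it lies on or below the chord
        # from its predecessor to the new point.
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            if (x2 - x1) * (y - y1) - (y2 - y1) * (x - x1) >= 0:
                hull.pop()
            else:
                break
        hull.append((x, y))
    return hull


def evaluate(hull, z):
    """Evaluate the envelope at rate z (flat beyond the last breakpoint)."""
    if z >= hull[-1][0]:
        return hull[-1][1]
    for (x1, y1), (x2, y2) in zip(hull, hull[1:]):
        if x1 <= z <= x2:
            return y1 + (y2 - y1) * (z - x1) / (x2 - x1)
    raise ValueError("z must be non-negative")


# Hypothetical cumulative (rate, quality) breakpoints for three layers.
steps = [(0.0, 0.0), (1.0, 0.90), (2.0, 0.92), (3.0, 0.95)]
hull = concave_envelope(steps)
```

In this example the second-layer breakpoint lies below the chord between its neighbours, so it is dropped from the hull and the envelope interpolates linearly between the first and third layers.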



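Let $n^I = \sum_{t=1}^{n} \mathbb{1}_t^I$ denote the number of I frames and $n^P = \sum_{t=1}^{n}(1 - \mathbb{1}_t^I)$ denote the number of P 
frames. Since the functions $\bar{q}^I(z_t)$ and $\bar{q}^P(z_t)$ are concave, by Jensen's inequality, we have 

$$\frac{1}{n^I}\sum_{t=1}^{n} \bar{q}^I(z_t)\mathbb{1}_t^I \le \bar{q}^I\left(\frac{1}{n^I}\sum_{t=1}^{n} z_t\mathbb{1}_t^I\right)$$

and 

$$\frac{1}{n^P}\sum_{t=1}^{n} \bar{q}^P(z_t)(1 - \mathbb{1}_t^I) \le \bar{q}^P\left(\frac{1}{n^P}\sum_{t=1}^{n} z_t(1 - \mathbb{1}_t^I)\right).$$

Problem (10) can then be bounded by: 

$$\begin{aligned} \underset{z_{1:n}}{\text{maximize}} \quad & \frac{n^I}{n}\bar{q}^I\left(\frac{1}{n^I}\sum_{t=1}^{n} z_t\mathbb{1}_t^I\right) + \frac{n^P}{n}\bar{q}^P\left(\frac{1}{n^P}\sum_{t=1}^{n} z_t(1 - \mathbb{1}_t^I)\right) \\ \text{s.t.} \quad & \frac{1}{n}\sum_{t=1}^{n} z_t \le \frac{1}{n}\sum_{t=1}^{n} r_t. \end{aligned} \qquad (11)$$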

If the video is reasonably long, e.g., several minutes, the number of frames $n$ will be very large. If we 
let $n \to \infty$ and assume the channel throughput $r_t$ is ergodic, $\frac{1}{n}\sum_{t=1}^{n} r_t$ will converge to the ergodic 
capacity $r_{avg} = \lim_{n\to\infty}\frac{1}{n}\sum_{t=1}^{n} r_t$. Furthermore, since $F_{GOP}$ is the length of the GOP, $\frac{1}{F_{GOP}}$ and 
$1 - \frac{1}{F_{GOP}}$ are the proportions of I and P frames in the video sequence. Thus, we have $\frac{n^I}{n} \to \frac{1}{F_{GOP}}$ and 
$\frac{n^P}{n} \to 1 - \frac{1}{F_{GOP}}$. Similarly, for stationary policies$^2$, the limits $z^I = \lim_{n\to\infty}\frac{1}{n^I}\sum_{t=1}^{n} z_t\mathbb{1}_t^I$ and $z^P = 
\lim_{n\to\infty}\frac{1}{n^P}\sum_{t=1}^{n} z_t(1 - \mathbb{1}_t^I)$ exist. We have $\lim_{n\to\infty}\left(\frac{1}{n}\sum_{t=1}^{n} z_t\mathbb{1}_t^I + \frac{1}{n}\sum_{t=1}^{n} z_t(1 - \mathbb{1}_t^I)\right) = 
\frac{1}{F_{GOP}} z^I + \left(1 - \frac{1}{F_{GOP}}\right) z^P$. Thus, we have shown the following theorem: 

Theorem 2: For ergodic wireless throughput and stationary adaptive scheduling policies, the following 
optimization gives an upper bound on performance: 

$$\begin{aligned} \underset{z^I,\, z^P}{\text{maximize}} \quad & \frac{1}{F_{GOP}}\bar{q}^I(z^I) + \left(1 - \frac{1}{F_{GOP}}\right)\bar{q}^P(z^P) \\ \text{s.t.} \quad & \frac{1}{F_{GOP}} z^I + \left(1 - \frac{1}{F_{GOP}}\right) z^P \le r_{avg}. \end{aligned} \qquad (12)$$

Since the rate-quality functions $\bar{q}^I(\cdot)$ and $\bar{q}^P(\cdot)$ are concave, the above optimization 
problem is convex and easily solved. In Section III-F, this upper bound will be employed as a 
benchmark to evaluate the performance of our MDP-based scheduling policy. 
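Because the objective of the bound in Theorem 2 is concave and the quality functions are nondecreasing, the rate constraint is tight at the optimum and the problem reduces to a one-dimensional concave search over the I-frame rate. The Python sketch below does this with a ternary search. It is a minimal illustration, not the solver used in the paper: `upper_bound` is a hypothetical helper and the square-root quality curves are stand-ins for the concave envelopes.

```python
import math


def upper_bound(q_i, q_p, r_avg, f_gop, iters=200):
    """Maximize (1/F) q_i(z_i) + (1 - 1/F) q_p(z_p) subject to
    (1/F) z_i + (1 - 1/F) z_p <= r_avg, for concave nondecreasing q_i, q_p.
    """
    w_i = 1.0 / f_gop
    w_p = 1.0 - w_i

    def z_p_of(z_i):
        # Spend the whole rate budget; clamp to guard against rounding.
        return max(0.0, (r_avg - w_i * z_i) / w_p)

    def objective(z_i):
        return w_i * q_i(z_i) + w_p * q_p(z_p_of(z_i))

    lo, hi = 0.0, r_avg / w_i  # range of z_i that keeps z_p >= 0
    for _ in range(iters):  # ternary search on a concave function
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) < objective(m2):
            lo = m1
        else:
            hi = m2
    z_i = 0.5 * (lo + hi)
    return z_i, z_p_of(z_i), objective(z_i)


# Stand-in concave quality curves and assumed parameters.
z_i, z_p, bound = upper_bound(math.sqrt, math.sqrt, r_avg=4.0, f_gop=16)
```

With identical quality curves for both frame types, the optimum allocates the same rate to I and P frames, so the bound equals the curve evaluated at the average throughput.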

F. Performance evaluation of the MDP-based scheduling policy 

In this section, we evaluate the performance of the policy obtained from our MDP-based formulation. 
Parallel to [6] and [22], we employ the FSMC channel model proposed in [16] to model the dynamics 
of Rayleigh fading channels. The SNR at the receiver is partitioned into 4 regions using the algorithm 
proposed in [16]. In our simulations, we set the average SNR to $\Lambda_{avg}$ = 10 dB. The MDP-based scheduling 
algorithm was evaluated on the test sequences "foreman", "bus", "flower", "mobile" and "Paris" [23]. These 
video sequences were encoded using the H.264/SVC reference software JSVM [24] with 3 layers. For 
each sequence, 200 transmissions were sent over the simulated channel. The startup delay constraint was 
fixed to 100 ms, i.e., video playback began 3 frames after the transmission began. To conceal errors, 
every lost frame was reconstructed by copying the preceding frame. For more details about the FSMC 
channel model and the encoding parameters of the video sequences, see Appendix C. 

The performance of the MDP-based scheduling policy was tested over the simulated Markov channel 
models with different Doppler frequencies ($f_d$ = 5 Hz and 3 Hz, respectively). The simulation results are 
summarized in Table III and Table IV. The time-averaged MS-SSIM value is converted to Difference 
Mean Opinion Score (DMOS) using the following mapping 

DMOS = 13.3442 ln(1 - SSIM) + 3.6226(1 - SSIM) + 77.0117. (13) 



2 A policy is called stationary if it is a function of state s and the function is invariant with respect to time t. 

Equation (13) is obtained by logistic regression using the MS-SSIM indices and MOS values of the 
images in the LIVE database [25]. DMOS ranges from 0 to 100. A value of 0 means perfect visual quality 
and a value of 100 means bad visual quality. Roughly speaking, a value of 50 means fair quality. It can be seen 
from Table III and Table IV that the DMOS value of the MDP-based scheduling policy is worse than 
the performance bound by at most 4, which is visually insignificant. Given that the bound given by 
Theorem 2 is an upper bound on quality (i.e., a lower bound on the DMOS value), the MDP-based scheduling policy 
is indeed near-optimal. 
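The mapping (13) is straightforward to apply to a time-averaged MS-SSIM value. The short Python sketch below implements it as written; the function name `ssim_to_dmos` and the two sample inputs are illustrative assumptions.

```python
import math


def ssim_to_dmos(ms_ssim):
    """Map a time-averaged MS-SSIM value in [0, 1) to predicted DMOS via (13)."""
    d = 1.0 - ms_ssim
    return 13.3442 * math.log(d) + 3.6226 * d + 77.0117


# Higher MS-SSIM should give a lower (better) DMOS.
dmos_low_quality = ssim_to_dmos(0.90)
dmos_high_quality = ssim_to_dmos(0.99)
```

The mapping is undefined at an MS-SSIM of exactly 1 because of the logarithm, so a caller would need to clip the input slightly below 1.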

IV. Near-optimal heuristic On-line Scheduling Algorithm 

Although the MDP-based formulation makes it possible to compute a good scheduling policy using 
the value iteration algorithm, off-line computation of such policies requires a priori knowledge of the 
channel dynamics. This motivated us to design a simple on-line scheduling policy that delivers 
performance similar to that of the MDP-based policy while requiring only limited a priori knowledge of the 
channel dynamics. 

Basically, a good online video scheduling algorithm should explicitly take advantage of the channel 
dynamics and schedule data from different quality layers as a function of the receiver buffer state. 
There are three fundamental questions in designing such a scheduler: 1) How should one incorporate 
limited knowledge of channel dynamics in adaptive scheduling? 2) How should one determine the 
number of enhancement layers to schedule? 3) How should one allocate the transmission rate 
among $\mathcal{I}^{pre}$, $\mathcal{I}$ and $\mathcal{I}^{post}$ (see the definitions in Section III-A)? In the following, we will show how to address these 
fundamental problems by reasonably simplifying the MDP-based scheduling algorithm. 

A. Channel Model Simplification 

In a practical wireless communication environment, accurate channel dynamics models such as the 
state transition probability $P_c$ are not generally available. Some basic characteristics of the channel 
dynamics can, however, be easily inferred. At any slot $t$, the instantaneous channel throughput $r_t = 
x_t(1 - y_t)$ can be derived using receiver channel state information feedback. The ergodic channel 
throughput $r_{avg}$ can be estimated by averaging $r_t$ over time. Furthermore, the temporal correlation 
coefficient $\rho = \frac{\mathrm{Cov}(R_t, R_{t+1})}{\sigma(R_t)\sigma(R_{t+1})}$ can also be estimated from $r_t$. Further, it is reasonable to assume that the channel 
throughput $R_t$ will typically regress to the mean $r_{avg}$. This inspires us to use a simple autoregressive 
model to capture the dynamics of the channel. The simplest model for $R_t$ is the first-order autoregressive 
model (AR(1)) as follows: 

$$R_t - \varphi R_{t-1} = c + N_t, \qquad (14)$$

where $N_t$ is an i.i.d. random variable with zero mean. From (14), the parameters $c$ and $\varphi$ can be 
estimated as $\varphi = \rho$ and $c = r_{avg}(1 - \rho)$. Thus, we have 

$$R_t - \rho R_{t-1} = r_{avg}(1 - \rho) + N_t. \qquad (15)$$

Using this autoregressive model, the amount of data that will be delivered in the next $\tau$ slots by 
the channel can be estimated as 

$$d(r_t) = \mathbb{E}\left[\sum_{j=0}^{\tau-1} R_{t+j} \,\middle|\, R_t = r_t\right] = \sum_{j=0}^{\tau-1}\left[r_t\rho^j + r_{avg}(1 - \rho^j)\right]. \qquad (16)$$

To obtain an accurate estimate in the near future, we set the length $\tau$ of the window into the future that 
will be considered to be the relaxation time$^3$ of the channel, i.e., $\tau = \lceil -(\ln\rho)^{-1} \rceil$. In the following, 
we use this to determine which quality layers to schedule. 
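The window length and the forecast in (16) can be computed directly from the three measured quantities. The Python sketch below is a minimal illustration; the function names and the sample values of $r_t$, $r_{avg}$ and $\rho$ are assumptions chosen so that the arithmetic is easy to check by hand.

```python
import math


def relaxation_window(rho):
    """tau = ceil(-1 / ln(rho)) for a correlation coefficient 0 < rho < 1."""
    return math.ceil(-1.0 / math.log(rho))


def predicted_delivery(r_t, r_avg, rho, tau=None):
    """Expected data delivered over the next tau slots under the AR(1) model (16)."""
    if tau is None:
        tau = relaxation_window(rho)
    return sum(r_t * rho**j + r_avg * (1.0 - rho**j) for j in range(tau))


tau = relaxation_window(0.5)            # -1/ln(0.5) is about 1.44, so tau = 2
d = predicted_delivery(2.0, 4.0, 0.5)   # 2 + (0.5*2 + 0.5*4) = 5
```

The first term of the sum is the current throughput itself, and later terms drift geometrically toward the long-run average.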

B. Layer Selection 

Given the current channel state, receiver buffer state, and estimated available capacity for a window 
$\tau$ into the future, the goal is to determine which layers to schedule. We will focus on determining 
the number of enhancement layers which should be scheduled. We denote by $L^{sch}(s_t)$ the number of 
layers to be scheduled if the state is $s_t$. Once $L^{sch}(s_t)$ is determined, the online scheduling algorithm 
only schedules data units from the first $L^{sch}(s_t)$ layers. 

The layer selection scheme for our proposed on-line algorithm is motivated by that of the MDP-based 
policy. Using $d(r_t)$ defined in (16), we can estimate the amount of data which can be delivered 
in the next $\tau$ slots. Let $\Gamma(\ell, s_t)$ be the amount of data which is not currently available at the playback 
buffer at time $t$, and belongs to the first $\ell$ layers of the next $\tau$ frames. The quantities $d(r_t)$ and $\Gamma(\ell, s_t)$ 
summarize the channel and buffer states for the next $\tau$ slots. Note that $\Gamma(\ell - 1, s_t) < d(r_t) < \Gamma(\ell, s_t)$ 
means that we can probably transmit all the data up to the $(\ell - 1)$th layer in the next $\tau$ slots. Intuitively, 
we can simply choose $L^{sch}(s_t) = \ell - 1$ when $\Gamma(\ell - 1, s_t) < d(r_t) < \Gamma(\ell, s_t)$. As discussed next, this 
layer selection scheme can be motivated by the near-optimal scheduling policies computed for the 
MDP-based model. 

Note that $r_t = x_t(1 - y_t)$ is determined by the state $s_t$, thus $d(r_t)$ can also be written as a function of 
$s_t$, i.e., $d(s_t)$. Suppose we partition the state space into subsets $\mathcal{V}^\ell = \{s \in \mathcal{S} : \Gamma(\ell - 1, s) < d(s) < 
\Gamma(\ell, s)\}$, $\ell \in \{1, \cdots, L + 1\}$$^4$, and calculate the fraction of states in $\mathcal{V}^\ell$ where the MDP-based policy 
only schedules the first $\ell - 1$ layers. As shown in Fig. 6, for 70% of the states of $\mathcal{V}^1$ and $\mathcal{V}^2$, the MDP-based 
policy only schedules the first layer. For about 65% of the states of $\mathcal{V}^3$, the MDP-based policy 
only schedules the first 2 layers. Finally, the MDP-based policy will schedule all the layers on 65% 
of the states in $\mathcal{V}^4$. These observations justify our intuition regarding layer selection. In our proposed 
on-line scheduling algorithm, we will simply choose $L^{sch}(s_t) = \ell - 1$ if $\Gamma(\ell - 1, s_t) < d(r_t) < \Gamma(\ell, s_t)$. 
In other words, our heuristic algorithm determines $L^{sch}(s_t)$ by roughly estimating the number of layers 
which can be transmitted. 

3 The relaxation time is defined as the temporal distance at which the temporal correlation coefficient is reduced to $1/e$. 
4 We define $\Gamma(L + 1, s_t) = +\infty$. 

C. Resource allocation between current and future GoPs 

In each transmission slot, $r_t$ bits of video data are delivered to the receiver. In the following, we refer 
to $r_t$ as the budget for slot $t$. Once $L^{sch}(s_t)$ is determined, we still need to determine how to allocate 
this budget among $\mathcal{I}^{pre}$, $\mathcal{I}$ and $\mathcal{I}^{post}$. Sometimes it is necessary to transmit data associated with the next I 
frame before the data units in the current GoP. For example, when the next I frame is approaching its 
display deadline and its base layer has not yet been received, focusing on transmitting the frames 
in the current GoP sequentially increases the risk that the next I frame cannot be decoded before 
its deadline. This in turn would cause severe decoding failures throughout the next GoP. 

We denote by $\Psi^{pre}(\ell, s_t)$ the amount of unreceived data in the first $\ell$ layers of $\mathcal{I}^{pre}$ at state $s_t$. We 
denote by $\Psi^{I}(\ell, s_t)$ the amount of unreceived data in the first $\ell$ layers of $\mathcal{I}$ at state $s_t$. We propose 
the following heuristic for allocating the bit budget between $\mathcal{I}^{pre}$ and $\mathcal{I}$. In each transmission slot, the 
scheduling algorithm allocates up to $\Omega_t = \frac{\Psi^I(L^{sch}(s_t), s_t)}{\Psi^{pre}(L^{sch}(s_t), s_t) + \Psi^I(L^{sch}(s_t), s_t)}$ of the transmission bit budget 
to $\mathcal{I}$. In other words, the number of bits allocated to $\mathcal{I}$ is $\min(\Omega_t \times r_t, \Psi^I(L^{sch}(s_t), s_t))$. 

Here $\Omega_t$ gives the relative importance of the next I frame and the current GoP. If $\Psi^I(L^{sch}(s_t), s_t) = 0$, 
then $\Omega_t = 0\%$ and it is not necessary to transmit any data for the next I frame. If $\Psi^{pre}(L^{sch}(s_t), s_t) = 0$, 
then $\Omega_t = 100\%$ and we only focus on transmitting the future GoPs. 

The online scheduling algorithm is summarized in Algorithm 1. 

D. Performance evaluation of the on-line scheduling algorithm 

The performance of the on-line scheduling algorithm was tested over the simulated Markov channel 
models with different Doppler frequencies ($f_d$ = 5 Hz and 3 Hz, respectively). This setting is the same 
as the simulation setting in Section III-F. The results are summarized in Fig. 7. As can be seen, 
the performance of the proposed online scheduling algorithm is almost as good as that of the MDP-based 
scheduling algorithm. Moreover, the online scheduling algorithm's performance is close to the bound 
given by Theorem 2. We conclude that it is a near-optimal scheduling algorithm. 

Algorithm 1 On-line adaptive scheduling algorithm 

Input: $s_t$, $r_{avg}$, $r_t$ and $\rho$ 
 1: $\tau \leftarrow \lceil -(\ln\rho)^{-1} \rceil$ 
 2: loop 
 3:     $d(r_t) \leftarrow \sum_{j=0}^{\tau-1} [r_t\rho^j + r_{avg}(1 - \rho^j)]$    ▷ Channel estimation 
 4:     for $\ell = 1 \to L$ do    ▷ Determine $L^{sch}(s_t)$ 
 5:         Compute $\Gamma(\ell, s_t)$ 
 6:         if $d(r_t) < \Gamma(\ell, s_t)$ then 
 7:             break 
 8:         end if 
 9:     end for 
10:     if $\ell = 1$ then 
11:         $L^{sch}(s_t) \leftarrow 1$ 
12:     else 
13:         $L^{sch}(s_t) \leftarrow \ell - 1$ 
14:     end if 
15:     Compute $\Psi^{pre}(L^{sch}, s_t)$ and $\Psi^{I}(L^{sch}, s_t)$ 
16:     $\Omega_t \leftarrow \Psi^{I}(L^{sch}, s_t) / (\Psi^{pre}(L^{sch}, s_t) + \Psi^{I}(L^{sch}, s_t))$ 
17:     Schedule $\min(\Omega_t \times r_t, \Psi^{I}(L^{sch}, s_t))$ bits from $\mathcal{I}$.    ▷ Scheduling data 
18:     Schedule $r_t - \min(\Omega_t \times r_t, \Psi^{I}(L^{sch}, s_t))$ bits from $\mathcal{I}^{pre}$ and $\mathcal{I}^{post}$. 
19: end loop 
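The layer selection and budget split of Algorithm 1 can be sketched in a few lines of Python. This is an illustrative reading of steps 4 to 18, not the authors' implementation: the inputs stand in for $d(r_t)$, $\Gamma(\ell, s_t)$, $\Psi^{pre}$ and $\Psi^{I}$, which a real scheduler would compute from the buffer state, and the function names and sample numbers are hypothetical. The split ratio follows the reading that $\Omega_t$ is the I-frame share of the unreceived data.

```python
import math


def select_layers(d, gamma):
    """Steps 4-14 of Algorithm 1: pick the number of layers to schedule.

    d     : predicted deliverable data over the window, d(r_t)
    gamma : gamma[l-1] is Gamma(l, s_t), the missing data in the first l layers
    """
    thresholds = list(gamma) + [math.inf]  # Gamma(L+1, s_t) = +inf
    for layer, needed in enumerate(thresholds, start=1):
        if d < needed:
            break
    return 1 if layer == 1 else layer - 1


def split_budget(r_t, psi_pre, psi_i):
    """Steps 16-18: bits for the next I frame and bits for the remaining data."""
    total = psi_pre + psi_i
    omega = psi_i / total if total > 0 else 0.0
    bits_i = min(omega * r_t, psi_i)
    return bits_i, r_t - bits_i


l_sch = select_layers(5.0, [3.0, 6.0, 10.0])   # Gamma(1) <= d < Gamma(2)
bits_i, bits_rest = split_budget(2.0, psi_pre=1.0, psi_i=3.0)
```

Appending an infinite threshold mirrors the footnote convention for $\Gamma(L + 1, s_t)$, so the case where every layer fits in the window needs no special handling.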



According to our MDP model in Section III, the MDP-based scheduling policy is supposed 
to be optimal among all considered scheduling policies. However, for the test video sequence "flower", 
the online scheduling algorithm even outperforms the MDP-based policy. This is because 
the MDP-based scheduling policy is derived from the rate-quality model in Section II-C, which 
assumes average rate-quality characteristics for all frames. The online algorithm schedules data units 
according to $\Gamma(\ell, s_t)$ using the actual size of the data units rather than the rate-quality model. Therefore, 
the online algorithm tends to estimate the buffer states more accurately, which may result in better 
performance. 

We have also tested the performance of the online algorithm without bit budget allocation between 
current and future GoPs. As can be seen, the performance is worse than that of the MDP-based scheduling 
policy and the performance bound. This demonstrates the necessity of allocating the bit budget between current and 
future GoPs. 

V. Conclusions 

We have developed adaptive scheduling algorithms for efficient scalable video transmission in 
wireless channels. By modeling the wireless channel as a Markov chain, an MDP model is proposed 
in which policies that maximize the visual quality predicted by MS-SSIM index can be computed. 
By simplifying the scheduling algorithm obtained from the MDP formulation, we propose an online 
scheduling algorithm which only requires limited knowledge of channel dynamics. Simulation results 
demonstrate the near-optimality of the proposed online scheduling policy versus a proposed bound on 
performance. 

Appendix A 
Transition Probability 

Notation: Let $\mathbf{1}$ be the vector of all ones and $\mathbf{0}$ be the zero vector. $\max\{a, b\}$ and $\min\{a, b\}$ 
are the componentwise maximum and minimum of vectors $a$ and $b$, respectively. $\mathbb{1}(\cdot)$ is the indicator 
function. 

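Let $s_t = (c_t, v_t)$ and $\mathcal{U}_{s_t}$ be the system state and the corresponding feasible control set at slot $t$, 
where $c_t = (x_t, y_t)$ and $v_t = (v_t^{pre}, v_t^{I}, v_t^{post})$. At the beginning of each slot, one frame is decoded 
and played out. We let $v_t^+ = (v_t^{pre+}, v_t^{I+}, v_t^{post+})$ denote the buffer state right after the first frame is 
displayed. If $f_t^I = 1$, i.e., the decoded frame is an I frame, then the frame set $\mathcal{I}$ becomes the next I 
frame, i.e., the $F_{GOP}$th frame. Hence, the buffer state is 

$$v_t^{I+} = \left(F_{GOP}, \sum_{\ell=1}^{L}\mathbb{1}(v_{t,\ell}^{post} \ge F_{GOP})\right), \qquad (17)$$

where $\sum_{\ell=1}^{L}\mathbb{1}(v_{t,\ell}^{post} \ge F_{GOP})$ is the number of received layers in the next I frame. Meanwhile, $\mathcal{I}^{pre}$ 
becomes the first $F_{GOP} - 1$ frames and $\mathcal{I}^{post}$ contains the frames whose index is larger than $F_{GOP}$. 
Thus, we have 

$$v_t^{pre+} = \min\{v_t^{post}, (F_{GOP} - 1)\mathbf{1}\} \qquad (18)$$

and 

$$v_t^{post+} = \max\{v_t^{post} - F_{GOP}\mathbf{1}, \mathbf{0}\}. \qquad (19)$$

If the decoded frame is not an I frame, the frame set $\mathcal{I}^{post}$ will not be affected and the buffer state 
$v_t^{post}$ does not change. $v_t^{pre}$ becomes 

$$v_t^{pre+} = \max\{v_t^{pre} - \mathbf{1}, \mathbf{0}\}. \qquad (20)$$

Summarizing (17), (18), (19) and (20), we have 

$$v_t^{pre+} = \begin{cases} \min\{v_t^{post}, (F_{GOP} - 1)\mathbf{1}\} & \text{if } f_t^I = 1, \\ \max\{v_t^{pre} - \mathbf{1}, \mathbf{0}\} & \text{if } f_t^I \ne 1, \end{cases}$$

$$v_t^{I+} = \begin{cases} \left(F_{GOP}, \sum_{\ell=1}^{L}\mathbb{1}(v_{t,\ell}^{post} \ge F_{GOP})\right) & \text{if } f_t^I = 1, \\ (f_t^I - 1, b_t^I) & \text{if } f_t^I \ne 1, \end{cases}$$

and 

$$v_t^{post+} = \begin{cases} \max\{v_t^{post} - F_{GOP}\mathbf{1}, \mathbf{0}\} & \text{if } f_t^I = 1, \\ v_t^{post} & \text{if } f_t^I \ne 1. \end{cases}$$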



After the first frame is displayed, the transmitter begins to sequentially transmit the collection of video 
data units indicated by the action $U_t = \mu(s_t) = \{(f_1, \ell_1), \cdots, (f_{|U_t|}, \ell_{|U_t|})\}$. Let $\Delta U_t = \{(f_1, \ell_1), \cdots, 
(f_{n_t}, \ell_{n_t})\}$ denote the completely received data units by the end of the slot, where $n_t$ is the number 
of received data units. Among the data units in $\Delta U_t$, let $\Delta v_t^{pre} = (\Delta b_1^{pre}, \Delta b_2^{pre}, \cdots, \Delta b_L^{pre})$ be the 
number of newly received data units for each layer in frame set $\mathcal{I}^{pre}$. Similarly, we denote $\Delta v_t^{post} = 
(\Delta b_1^{post}, \Delta b_2^{post}, \cdots, \Delta b_L^{post})$ as the number of newly received data units for each layer in frame set 
$\mathcal{I}^{post}$ and $\Delta b_t^{I}$ as the number of received data units for $\mathcal{I}$. At the beginning of the $(t + 1)$th slot, we 
have the following state transition relationship 

$$v_{t+1}^{pre} = v_t^{pre+} + \Delta v_t^{pre}, \qquad (21)$$
$$v_{t+1}^{I} = (f_t^{I+}, b_t^{I+} + \Delta b_t^{I}), \qquad (22)$$
$$v_{t+1}^{post} = v_t^{post+} + \Delta v_t^{post}. \qquad (23)$$

The amount of video data in $\Delta U_t$, denoted by $\Phi(v_t^+, \Delta U_t)$, can be estimated according to the buffer state 
$v_t^+$ and the rate-quality model introduced in Section II-C. Specifically, for each data unit in $\Delta U_t$, we 
first determine whether it belongs to an I frame or a P frame according to $v_t^+$ and then estimate the 
amount of data by the rate-quality model. The set $\Delta U_t$ records the completely transmitted data units 
up to the $(f_{n_t}, \ell_{n_t})$th data unit. However, data unit $(f_{n_t+1}, \ell_{n_t+1})$ is only partially received. Denoting the 
amount of data in unit $(f_{n_t+1}, \ell_{n_t+1})$ by $\phi(v_t^+, \Delta U_t)$, the amount of received data is at least $\Phi(v_t^+, \Delta U_t)$ 
and at most $\Phi(v_t^+, \Delta U_t) + \phi(v_t^+, \Delta U_t)$. Assuming the physical layer packet length is $L_{PHY}$, there are 
$N = \lceil x_t / L_{PHY} \rceil$ packet transmissions during a time slot. The number of successfully transmitted packets 
is at least $N_l = \lceil \Phi(v_t^+, \Delta U_t) / L_{PHY} \rceil$ and is less than $N_h = \lceil (\Phi(v_t^+, \Delta U_t) + \phi(v_t^+, \Delta U_t)) / L_{PHY} \rceil$. As assumed in Section II-D, 
the channel state is constant over each slot. Thus, the packet losses are independent within each slot. 
The number of successful packet transmissions in a slot is distributed binomially. Hence, the state 
transition probability from $s_t = (c_t, v_t)$ to $s_{t+1} = (c_{t+1}, v_{t+1})$ is 

$$P_\mu(s_{t+1}|s_t) = \left[\sum_{n=N_l}^{N_h - 1}\binom{N}{n}(1 - y_t)^{n}\, y_t^{N-n}\right] P_c(c_t, c_{t+1}), \qquad (24)$$

where the first multiplicative term is the transition probability of the receiver buffer state from $v_t$ to 
$v_{t+1}$ and the second term is the transition probability of the channel state from $c_t$ to $c_{t+1}$. 
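The buffer-state factor in the transition probability is a binomial tail sum: the probability that the number of successful packets in a slot falls between a lower count and an upper count. The Python sketch below computes such a sum for packets lost independently with probability $y_t$. The function name and the sample numbers are illustrative assumptions, and the half-open range follows the "at least, less than" wording above.

```python
from math import comb


def buffer_transition_prob(n_total, y, n_low, n_high):
    """P(n_low <= successes < n_high) for n_total independent packets,
    each lost with probability y."""
    return sum(
        comb(n_total, n) * (1.0 - y) ** n * y ** (n_total - n)
        for n in range(n_low, min(n_high, n_total + 1))
    )


# Two packets, loss probability 0.5: exactly one success has probability 0.5.
p = buffer_transition_prob(n_total=2, y=0.5, n_low=1, n_high=2)
```

Multiplying this value by the channel transition probability gives the full state transition probability for a given pair of states.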

Appendix B 
Computation of $t_\mu$ and $P_\mu$ 

Let $v_t^W$ be the number of buffered packets outside the window. Note that, when the system moves in 
$\mathcal{S}_{\overline{W}}$, the system always schedules as many enhancement layer data units as possible. Hence, $v_t^W$ and $f_t^I$ 
contain all the information about the buffer state $v_t$. We can further simplify the state representation 
$(c_t, v_t)$ to $(c_t, v_t^W, f_t^I)$ when $(c_t, v_t) \in \mathcal{S}_{\overline{W}}$. All the states in $\mathcal{S}_{\overline{W}}$ correspond to some states with 
$v_t^W > 0$. All the states in $\mathcal{S}_W$ correspond to some states with $v_t^W \le 0$. 

When the system evolves in $\mathcal{S}_{\overline{W}}$, at the beginning of a slot $t$, the state $v_t^W$ first decreases by 
$\Delta v^{dec}(s_t)$ when the current frame is displayed. Then, the transmitter schedules as many enhancement 
layer data units as possible. At the end of the slot, $v_t^W$ increases by $\Delta v^{inc}(s_t)$. Because the quantity 
$\Delta v^W(s_t) = \Delta v^{inc}(s_t) - \Delta v^{dec}(s_t)$ only depends on the state $s_t$, the state $v_t^W$ varies like a random walk 
but with Markovian step-size $\Delta v^W(s_t)$. Now we need to compute, given the starting state $s_t$ with 
$v_t^W > 0$, how long it takes to jump to a state where $v_t^W \le 0$. Let $\Delta v_{max}^W = \max_{s \in \mathcal{S}_{\overline{W}}}\{|\Delta v^W(s)|\}$ be 
the maximum step size. We define the $k$th level set as 

$$\mathcal{S}_k = \mathcal{C} \times \{k\Delta v_{max}^W + 1, \cdots, (k + 1)\Delta v_{max}^W\} \times \{1, \cdots, F_{GOP}\}, \qquad (25)$$

where $k \ge -1$. After the system moves to $\mathcal{S}_{\overline{W}}$, the system state transits in the level sets $\mathcal{S}_k$ with $k \ge 0$ 
and then jumps to a state in $\mathcal{S}_{-1}$. If we concatenate all these level sets as $\bigcup_{k=-1}^{\infty}\mathcal{S}_k$, then the state 
transition matrix is a banded infinite matrix of the following form 

$$\begin{bmatrix} B_0 & B_1 & & & \\ A_0 & A_1 & A_2 & & \\ & A_0 & A_1 & A_2 & \\ & & A_0 & A_1 & \ddots \\ & & & \ddots & \ddots \end{bmatrix}.$$

The blocks $B_0$, $B_1$, $A_0$, $A_1$ and $A_2$ are all square matrices of dimension $d = |\mathcal{C}|\Delta v_{max}^W F_{GOP}$. When 
the system state jumps to $\mathcal{S}_{\overline{W}}$, the state lies in level set $\mathcal{S}_0$. Now we need to compute how long 
the system takes to reach $\mathcal{S}_{-1}$ for the first time. This problem was essentially solved for continuous-time 
quasi-birth-death processes by Neuts in the early 1980s [26]. The following derivation follows 
the one in Neuts' book, but for the discrete-time case. 

Let $G^{(\nu)}_{j,j'}(k, x)$ be the probability that, starting from the $j$th state in level set $\mathcal{S}_n$, the system state 
hits level set $\mathcal{S}_{n-\nu}$ for the first time after $x$ movements in which there are $k$ left movements. The 
hitting point is the $j'$th state in $\mathcal{S}_{n-\nu}$. Let $G^{(\nu)}(k, x)$ be the $d \times d$ matrix with $G^{(\nu)}_{j,j'}(k, x)$ as the $(j, j')$ 
entry. Applying the Z-transform to this distribution, we have 

$$G^{(\nu)}(z, s) = \sum_{k=0}^{\infty} z^k \left(\sum_{x=0}^{\infty} G^{(\nu)}(k, x)\, s^x\right). \qquad (26)$$

Denoting $G^{(1)}(z, s)$ by $G(z, s)$, it can be proved that $G^{(\nu)}(z, s) = (G(z, s))^{\nu}$ [26]. Conditioning on 
the state visited in the first state transition, we have 

$$G(z, s) = A_0 zs + A_1 G(z, s)s + A_2 G^2(z, s)s. \qquad (27)$$

Now, define $G = G(1, 1)$, $C_0 = (I - A_1)^{-1}A_0$, $C_1 = (I - A_1)^{-1}$ and $C_2 = (I - A_1)^{-1}A_2$. By simple 
matrix manipulation of (27), we have 

$$G = C_0 + C_2 G^2. \qquad (28)$$

We can compute $G$ by successive substitutions starting with the zero matrix. Because $G = G(z, s)|_{z=1, s=1}$, 
its entry $G_{j,j'}$ is actually the conditional probability that, given the initial state $j \in \mathcal{S}_0$, the system will 
move to $\mathcal{S}_{-1}$ for the first time at the $j'$th state. Hence, we can find $P_\mu(\cdot|\cdot)$ from $G$. 

Let $M = \frac{\partial G(z, s)}{\partial s}\big|_{z=1, s=1}$. The $(j, j')$th entry $M_{j,j'} = \sum_{k=0}^{\infty}\sum_{x=0}^{\infty} x\, G_{j,j'}(k, x)$ is the conditional 
expectation of the time that, given the initial state $j \in \mathcal{S}_0$, the system takes to hit $\mathcal{S}_{-1}$ for the first 
time at the $j'$th state. Differentiating both sides of (27) with respect to $s$, then setting $s = 1$, $z = 1$, 
we have 

$$M = A_0 + A_1 M + A_1 G + A_2 G^2 + A_2(GM + MG). \qquad (29)$$

Since (27) evaluated at $z = s = 1$ gives $A_0 + A_1 G + A_2 G^2 = G$, using the definitions of $C_1$ and $C_2$, 
equation (29) can be simplified as 

$$M = C_1 G + C_2(GM + MG). \qquad (30)$$

We can compute $M$ by successive substitutions starting with the zero matrix. We define a vector 

$$t = M\mathbf{1}. \qquad (31)$$

The $j$th entry $t_j = \sum_{j'}\sum_{k=0}^{\infty}\sum_{x=0}^{\infty} x\, G_{j,j'}(k, x)$ is the conditional expectation of the time that the 
system takes to go back to $\mathcal{S}_{-1}$ given the initial state $j \in \mathcal{S}_0$, i.e., $t_\mu(j)$. Therefore, we can compute 
$t_\mu$ and $P_\mu$ using (28) and (31). 
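The successive substitutions for (28) and (30) are easiest to check in the scalar case $d = 1$, where the blocks $A_0$, $A_1$ and $A_2$ reduce to the probabilities of moving one level down, staying, and moving one level up. The Python sketch below runs both iterations for that special case only. The function name and the probabilities are assumptions for illustration; the matrix case would replace the scalar products with matrix products.

```python
def first_passage_scalar(a0, a1, a2, iters=500):
    """Scalar (d = 1) version of (28) and (30) by successive substitution.

    a0, a1, a2 : probabilities of moving one level down, staying, and
                 moving one level up (assumed to sum to 1, with a0 > a2)
    Returns (G, M): the probability of ever reaching the next lower level
    and the expected number of steps to reach it.
    """
    c0 = a0 / (1.0 - a1)
    c1 = 1.0 / (1.0 - a1)
    c2 = a2 / (1.0 - a1)
    g = 0.0
    for _ in range(iters):
        g = c0 + c2 * g * g                  # (28): G = C0 + C2 G^2
    m = 0.0
    for _ in range(iters):
        m = c1 * g + c2 * (g * m + m * g)    # (30): M = C1 G + C2 (GM + MG)
    return g, m


g, m = first_passage_scalar(0.5, 0.2, 0.3)
```

With a downward drift of 0.5 - 0.3 = 0.2 levels per step, the walk reaches the next lower level with probability one and takes five steps on average, which is what the two iterations converge to.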



Appendix C 
Simulation Settings 



We employ the FSMC channel model proposed in [16] to model the dynamics of Rayleigh fading 
channels. The SNR at the receiver is partitioned into $|\mathcal{C}|$ regions using the algorithm proposed in [16]. 
Let $\Lambda_k$ be the partition thresholds, where $\Lambda_0 = -\infty$ and $\Lambda_{|\mathcal{C}|} = \infty$. Let $\bar{\Lambda}_k$ be the representative SNR 
in the $k$th region. For Rayleigh fading channels, we have 

r A 



A. 



(32) 



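where $p(\Lambda) = \frac{1}{\Lambda_{avg}}\exp\left(-\frac{\Lambda}{\Lambda_{avg}}\right)$ is the probability distribution function of the received instantaneous 
SNR of Rayleigh fading channels with average SNR $\Lambda_{avg}$. According to [16], the state transition 
probability $P_c$ is computed as 

$$P_c(i, j) = \begin{cases} \dfrac{\mathcal{K}(\Lambda_i)\Delta T}{\pi_i} & \text{if } j = i + 1, \\ \dfrac{\mathcal{K}(\Lambda_{i-1})\Delta T}{\pi_i} & \text{if } j = i - 1, \\ 1 - \dfrac{\mathcal{K}(\Lambda_i)\Delta T}{\pi_i} - \dfrac{\mathcal{K}(\Lambda_{i-1})\Delta T}{\pi_i} & \text{if } j = i, \\ 0 & \text{otherwise,} \end{cases}$$

where $\pi_i = \int_{\Lambda_{i-1}}^{\Lambda_i} p(\Lambda)\,d\Lambda$ and $\mathcal{K}(\Lambda) = \sqrt{\frac{2\pi\Lambda}{\Lambda_{avg}}}\, f_d \exp\left(-\frac{\Lambda}{\Lambda_{avg}}\right)$ is the level crossing rate of threshold $\Lambda$, 
where $f_d$ is the Doppler frequency. The coherence time is estimated via $t_c = 0.423/f_d$. In our 
simulations, we set $|\mathcal{C}| = 4$ and $\Lambda_{avg}$ = 10 dB. 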

We assume that BPSK, QPSK and 8PSK are used for modulation. The symbol error rate $p_k^s$ in 
the $k$th SNR region is $p_k^s = 2Q\left(\sqrt{2\bar{\Lambda}_k}\sin\frac{\pi}{2^M}\right)$, where $M = 1, 2, 3$ for BPSK, QPSK and 8PSK, 
respectively. Each packet contains 2048 symbols. Thus, the packet length is $L_{PHY} = 2048 \times M$, where 
$M = 1, 2$ and 3 for BPSK, QPSK and 8PSK, respectively. The transmission time for each packet is 
$\Delta t$ = 1.5 ms. The transmission data rate is given by $x_k = \frac{\Delta T}{\Delta t} L_{PHY}$. The packet error rate is given by 
$y_k = 1 - (1 - p_k^s)^{2048}$. The modulation scheme for the $k$th channel state is chosen such that the throughput 
$x_k(1 - y_k)$ is maximized. 
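The modulation choice described above can be sketched as a small search over the three schemes. The Python code below assumes the standard M-PSK symbol error approximation $2Q(\sqrt{2\,\mathrm{SNR}}\sin(\pi/2^M))$ with the SNR on a linear scale, and a packet error rate of $1 - (1 - p^s)^{2048}$. The function names and the value of `packets_per_slot`, which stands in for $\Delta T / \Delta t$, are assumptions rather than settings confirmed by the paper.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def best_modulation(snr, packets_per_slot=22, symbols=2048):
    """Return (M, throughput in bits per slot) maximizing x(1 - y).

    snr is the representative SNR on a linear scale; packets_per_slot
    stands in for Delta T / Delta t and is an assumed value.
    """
    best = None
    for m in (1, 2, 3):  # BPSK, QPSK, 8PSK
        arg = math.sqrt(2.0 * snr) * math.sin(math.pi / 2**m)
        ser = min(1.0, 2.0 * q_function(arg))   # symbol error rate
        per = 1.0 - (1.0 - ser) ** symbols      # packet error rate
        rate = packets_per_slot * symbols * m   # bits sent per slot
        throughput = rate * (1.0 - per)
        if best is None or throughput > best[1]:
            best = (m, throughput)
    return best


m_low, _ = best_modulation(10.0)      # 10 dB
m_high, _ = best_modulation(1000.0)   # 30 dB
```

Under these assumptions the search settles on BPSK at 10 dB, where the higher-order schemes lose almost every packet, and on 8PSK at 30 dB, where packet errors are negligible for all three.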

The proposed dynamic scheduling algorithm was evaluated on the test sequences "foreman", "bus", 
"flower", "mobile" and "Paris" [23]. These video sequences were encoded using the H.264/SVC reference 
software JSVM [24] with 3 layers. The GOP length was fixed at $F_{GOP}$ = 16. The encoding parameters 
and rate-quality model parameters are listed in Table II. The parameters $r^I$ and $r^P$ are measured in 
megabits and $q$ is measured in MS-SSIM index values. The quantization parameters (QP) were 
chosen such that the data rate of the base layer is lower than the average channel throughput. 
The Lagrangian multipliers for motion estimation and mode decision were set as QP - 2. We employ 
this configuration to make sure that the channel is at least good enough to support the base layer. 
Otherwise, no scheduling policy can provide acceptable visual quality. 

TABLE I 

FREQUENTLY USED NOTATION. 

Notation                                 Description 
$F_{GOP}$                                Number of frames in a GOP. 
$L$                                      Number of layers. 
$z_t$                                    The amount of received data for the frame that is played out in the $t$th slot. 
$\omega_\ell^P$ and $\omega_\ell^I$      The amount of data in the $\ell$th layer of a P and an I frame, respectively. 
$q_\ell$                                 The visual quality increment when the $\ell$th layer is correctly received. 
$q^I(z_t)$, $q^P(z_t)$                   The rate-quality model for I frames and P frames. 
$\bar{q}^I(z_t)$, $\bar{q}^P(z_t)$       The concave envelopes of $q^I(z_t)$ and $q^P(z_t)$. 
$X_t$ and $Y_t$                          The transmission bit rate and the packet error rate at the $t$th slot. 
$x_t$ and $y_t$                          The realizations of $X_t$ and $Y_t$. 
$R_t$, $r_t$                             $R_t = X_t(1 - Y_t)$ is the channel throughput, which is random. $r_t$ is a realization of $R_t$. 
$r_{avg}$                                The average value of the channel throughput $r_t$ over time. 
$C_t$, $V_t$ and $S_t$                   The channel state, buffer state and system state at the $t$th slot. 
$c_t$, $v_t$ and $s_t$                   The realizations of $C_t$, $V_t$ and $S_t$. 



TABLE II 

The encoding parameters and rate-quality model parameters of the tested sequences. 

            Layer 1 (base layer)             Layer 2                          Layer 3 
Sequence    QP   r^I      r^P      q_1       QP   r^I      r^P      q_2       QP   r^I      r^P      q_3 
foreman     31   0.0612   0.0149   0.9362    27   0.0638   0.0245   0.0222    26   0.0228   0.0290   0.0045 
bus         39   0.0565   0.0140   0.8408    35   0.0581   0.0203   0.0644    33   0.0329   0.0297   0.0277 
flower      40   0.0846   0.0130   0.9117    36   0.075    0.0225   0.0400    35   0.028    0.0268   0.008 
mobile      40   0.0972   0.0133   0.8839    37   0.0782   0.0222   0.0408    36   0.0309   0.0284   0.0121 
Paris       33   0.1121   0.0146   0.9487    28   0.1014   0.0209   0.0241    27   0.0387   0.0206   0.0041 

TABLE III 

The performance of the near-optimal policy in SSIM-predicted DMOS. $f_d$ = 5 Hz. 

                    Paris     mobile    flower    bus       foreman 
MDP-based Policy    33.5361   46.5898   39.8506   47.3337   36.8433 
Upper bound (12)    33.4279   44.7976   38.0841   46.8378   36.3160 

TABLE IV 

The performance of the near-optimal policy in SSIM-predicted DMOS. $f_d$ = 3 Hz. 

                    Paris     mobile    flower    bus       foreman 
MDP-based Policy    33.8628   44.9776   42.8653   48.2840   36.6403 
Upper bound (12)    33.4279   44.7941   38.0925   46.9338   36.3536 



[Block diagram with the components Scheduler, Transmitter, Wireless Channel and Receiver, and the labels "Channel and Receiver Buffer State", "Requests" and "Requested Data".] 

Fig. 1. Dynamic scheduling system for wireless video transmission. 

Fig. 2. Encoder prediction structure when $L = 3$. The prediction order is indicated by arrows. The data unit index $(f, \ell)$ is also shown 
on each data unit. 

Fig. 3. An illustration of the rate-quality function $q^I(z_t)$ for the I frame. The rate-quality function $q^I(z_t)$ is piecewise constant 
and right-continuous (solid). Its concave envelope $\bar{q}^I(z_t)$ is also shown (dotted). 

Fig. 4. An illustration of the receiver buffer state when $F_{GOP} = 8$, $L = 3$. $v_t^{pre} = (4, 2, 1)$, $v_t^{I} = (6, 2)$, $v_t^{post} = (3, 3, 1)$. 

[Figure legend: percentage of states that only schedule the 1st layer; percentage of states that schedule the first 2 layers; percentage of states that schedule all 3 layers.] 
(a) $f_d$ = 5 Hz. 